feat:OpenAPI refactor: shared schemas, EvalRunOutputItemResult, remove -2 #227
Conversation
Walkthrough
Refactors and expands the OpenAPI spec in src/libs/tryAGI.OpenAI/openapi.yaml by adding shared schemas/enums, updating references to use them, introducing EvalRunOutputItemResult, extending ScoreModelGrader sampling_params, updating examples/descriptions, and removing legacy “-2” schema variants.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Client
participant API as Eval Runs API
participant Grader as Grader Engine
Client->>API: POST /eval-runs/{id}/execute
API->>Grader: Evaluate output item(s)
Grader-->>API: Result(s) per item (EvalRunOutputItemResult[])
API-->>Client: 200 OK with results and usage (SearchContextSize)
note over API,Grader: Sampling params may include max_completions_tokens and reasoning_effort
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (3)
src/libs/tryAGI.OpenAI/openapi.yaml (3)
9979-9982: Avoid null-only schema; align with chosen OAS version.
type: 'null' plus nullable: true is tool-fragile (OAS 3.0 does not allow type: 'null'; OAS 3.1 removes nullable). If the field is truly always null, drop it. If it’s optional, model it as a union.
Option A (remove the property if it is always null):
- status:
-   type: 'null'
-   nullable: true
Option B (union; OAS 3.1 style):
- status:
-   type: 'null'
-   nullable: true
+ status:
+   anyOf:
+     - { type: 'null' }
+     - { type: string }
Run a quick validation with your chosen linter to confirm compatibility.
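The null-only pattern can also be detected mechanically before choosing between Option A and Option B. The following is a minimal sketch, assuming the spec has already been parsed into plain Python dicts and lists; the function name and the sample fragment are hypothetical, and a union whose only member is null is not flagged.

```python
from typing import Any, Iterator


def find_null_only_schemas(node: Any, path: str = "#", in_union: bool = False) -> Iterator[str]:
    """Yield paths of schemas whose type is exactly 'null' outside any anyOf/oneOf."""
    if isinstance(node, dict):
        # A null member inside a union is fine; a standalone null type is not.
        if node.get("type") == "null" and not in_union:
            yield path
        for key, child in node.items():
            yield from find_null_only_schemas(child, f"{path}/{key}", key in ("anyOf", "oneOf"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_null_only_schemas(child, f"{path}/{index}", in_union)


# Hypothetical fragment mirroring the two shapes discussed in the comment.
spec = {
    "components": {
        "schemas": {
            "Before": {"properties": {"status": {"type": "null", "nullable": True}}},
            "After": {
                "properties": {
                    "status": {"anyOf": [{"type": "null"}, {"type": "string"}]}
                }
            },
        }
    }
}

null_only = list(find_null_only_schemas(spec))
print(null_only)
```

Only the `Before` shape is reported, because the null member of the `After` union sits inside `anyOf`.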
18410-18413: Minor: description spacing can confuse formatters.
The multi-line string contains irregular spacing; non-blocking.
- description: "Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.\n Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters."
+ description: "Set of up to 16 key–value pairs attached to an object. Useful for storing additional structured metadata and querying via API or dashboard. Keys: max 64 chars. Values: max 512 chars."
13876-13881: Consolidate duplicate enums: remove DetailEnum and reuse ImageDetail.
DetailEnum duplicates ImageDetail (low/high/auto). Remove DetailEnum from the OpenAPI spec, update any $ref to use ImageDetail, then regenerate the client code to remove the generated DetailEnum artifacts.
Definitions/targets: src/libs/tryAGI.OpenAI/openapi.yaml:13876 (DetailEnum) and src/libs/tryAGI.OpenAI/openapi.yaml:16102 (ImageDetail). Generated files referencing DetailEnum: src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.DetailEnum.g.cs, src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.DetailEnum*.g.cs, and entries in JsonSerializerContextTypes.g.cs.
Proposed removal:
- DetailEnum:
-   enum:
-     - low
-     - high
-     - auto
-   type: string
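Duplicate enums like this pair can be found by grouping schemas on their type and value set. A minimal sketch, assuming `components.schemas` is available as a dict; the helper name is hypothetical and the `SearchContextSize` values are assumed purely for illustration.

```python
from collections import defaultdict


def find_duplicate_enums(schemas: dict) -> list:
    """Group schema names that declare the same type and the same set of enum values."""
    groups = defaultdict(list)
    for name, schema in schemas.items():
        values = schema.get("enum")
        if isinstance(values, list):
            # Order of enum values is ignored when comparing.
            groups[(schema.get("type"), frozenset(values))].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


schemas = {
    "DetailEnum": {"type": "string", "enum": ["low", "high", "auto"]},
    "ImageDetail": {"type": "string", "enum": ["low", "high", "auto"]},
    # Values below are assumed for illustration only.
    "SearchContextSize": {"type": "string", "enum": ["low", "medium", "high"]},
}

duplicates = find_duplicate_enums(schemas)
print(duplicates)
```

Each reported group is a candidate for consolidation into a single shared schema.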
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (99)
src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI..JsonSerializerContext.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.ConversationsClient.UpdateConversation.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.IConversationsClient.UpdateConversation.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.Annotation2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerCallOutputItemParamStatus.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerCallOutputItemParamStatusNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerEnvironment1.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerEnvironment1Nullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerUsePreviewToolEnvironment.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ComputerUsePreviewToolEnvironmentNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ContainerFileCitationBody2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ContainerFileCitationBody2TypeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ContentItem.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.DetailEnum.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.DetailEnumNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.FileCitationBody2Type.g.csis excluded 
by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.FunctionCallItemStatus.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.FunctionCallItemStatusNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ImageDetail.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.ImageDetailNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputFileContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputFileContent2TypeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContent2Detail.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContent2DetailNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContent2TypeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContentDetail.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.InputImageContentDetailNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.OutputTextContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.OutputTextContent2TypeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RankerVersionType.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RankerVersionTypeNullable.g.csis excluded 
by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.SearchContextSize.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.SearchContextSizeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.WebSearchPreviewToolSearchContextSize.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.WebSearchPreviewToolSearchContextSizeNullable.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonSerializerContextTypes.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.Annotation2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.Annotation2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.Annotation2Discriminator.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.Annotation2Discriminator.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ComputerCallOutputItemParam.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ComputerCallOutputItemParamStatus.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ComputerEnvironment1.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ComputerScreenshotContent.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ComputerUsePreviewTool.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ContainerFileCitationBody2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ContainerFileCitationBody2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ContentItem.g.csis excluded 
by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ConversationItem.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.CreateEvalCompletionsRunDataSourceSamplingParams.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.CreateEvalResponsesRunDataSourceSamplingParams.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.DetailEnum.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.EvalRunOutputItem.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.EvalRunOutputItemResult.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.EvalRunOutputItemResultSample.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.EvalRunOutputItemResultSample.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.FileCitationBody2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.FileCitationBody2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.FunctionCallItemStatus.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.FunctionCallOutputItemParam.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.GraderScoreModel.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.GraderScoreModelSamplingParams.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ImageDetail.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputFileContent2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputFileContent2.g.csis excluded 
by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputFileContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputImageContent.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputImageContent2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputImageContent2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputImageContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputTextContent2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputTextContent2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.InputTextContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.LogProb2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.LogProb2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.Message.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.MessageRole.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.MessageStatus.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.MetadataParam.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.OutputTextContent2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.OutputTextContent2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.OutputTextContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RankerVersionType.g.csis excluded 
by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RankingOptions.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RefusalContent2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RefusalContent2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RefusalContent2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.SearchContextSize.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.SummaryTextContent.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.TextContent.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.TopLogProb2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.TopLogProb2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.UpdateConversationBody.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.UrlCitationBody2.Json.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.UrlCitationBody2.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.UrlCitationBody2Type.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.WebSearchPreviewTool.g.csis excluded by!**/generated/**src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.WebSearchPreviewToolSearchContextSize.g.csis excluded by!**/generated/**
📒 Files selected for processing (1)
src/libs/tryAGI.OpenAI/openapi.yaml(23 hunks)
🔇 Additional comments (16)
src/libs/tryAGI.OpenAI/openapi.yaml (16)
10037-10037: LGTM: clearer description.
10174-10176: LGTM: shared enum ref for environment reduces drift.
11434-11435: LGTM: add reasoning_effort alongside max_completion_tokens.
11711-11712: LGTM: mirrored reasoning_effort here too.
14631-14632: LGTM: results now reference a reusable result schema.
15663-15667: LGTM on introducing a status enum; ensure usages match.
Confirm all status properties that should use this enum actually reference it (see Line 15690 below).
16102-16107: LGTM: ImageDetail enum centralizes detail values.
16686-16686: LGTM: refer to ImageDetail in InputFileContent.
17772-17782: LGTM: Message uses shared role/status enums; nice cleanup and docs.
18304-18320: LGTM: Public MessageRole/MessageStatus look sane; keep values stable.
Adding/removing enum values is breaking for many clients; treat changes here as versioned.
Consider documenting stability guarantees in release notes.
19432-19441: LGTM: externalize ranker versions; document stability of dated identifiers.
Add a note on how default-2024-11-15 evolves and whether auto is recommended for long-lived clients.
28135-28140: LGTM: SearchContextSize enum clarifies allowed values.
28323-28323: LGTM: clearer summary description.
28351-28351: LGTM: clearer text content description.
30076-30079: LGTM: reference shared SearchContextSize.
17757-17765: LGTM — verification passed
- No $ref to "#/components/schemas/*-2" was found.
- RefusalContent, InputTextContent, OutputTextContent, InputImageContent, and InputFileContent are each defined exactly once.
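A check of this kind can be repeated locally. The sketch below is one possible way to do it, assuming the spec is loaded as nested dicts and lists; the function name and the sample fragment are hypothetical and are not the script the reviewer ran.

```python
import re

# Matches references to legacy "-2" schema variants.
LEGACY_REF = re.compile(r"^#/components/schemas/.+-2$")


def find_legacy_refs(node, found=None):
    """Collect every $ref value that points at a '-2' schema variant."""
    found = [] if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and LEGACY_REF.match(value):
                found.append(value)
            else:
                find_legacy_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            find_legacy_refs(item, found)
    return found


# Hypothetical fragment: one legacy reference and one current reference.
fragment = {
    "old": {"$ref": "#/components/schemas/InputTextContent-2"},
    "new": [{"$ref": "#/components/schemas/InputTextContent"}],
}

legacy = find_legacy_refs(fragment)
print(legacy)
```

An empty result corresponds to the "No $ref ... was found" outcome reported above.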
ComputerEnvironment1:
  enum:
    - windows
    - mac
    - linux
    - ubuntu
    - browser
  type: string
Ubuntu vs Linux: adding a new enum value can break clients.
Including both linux and ubuntu introduces overlap and may be a breaking change for strict consumers. Recommend omitting ubuntu or documenting/versioning this change.
Apply:
  ComputerEnvironment1:
    enum:
      - windows
      - mac
-     - linux
-     - ubuntu
+     - linux
      - browser
    type: string
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around lines 10006 to 10013, the new enum
value "ubuntu" overlaps with existing "linux" and can break strict clients;
remove the "ubuntu" enum entry from the schema (or alternatively revert this
change and add a clear semver-major version bump and changelog note if you must
keep it), update any example/usage docs to reflect the retained values, and add
a release note documenting the enum change and the required client
impact/versioning decision.
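Why a new enum value can break strict consumers is easiest to see from the client side. A minimal sketch, assuming a client whose enum was generated before ubuntu was added; the class and function names are hypothetical and do not correspond to the generated C# types.

```python
from enum import Enum


class ComputerEnvironment(Enum):
    """Client-side enum generated before 'ubuntu' existed (hypothetical)."""
    WINDOWS = "windows"
    MAC = "mac"
    LINUX = "linux"
    BROWSER = "browser"


def parse_strict(value: str) -> ComputerEnvironment:
    # Raises ValueError for any value the client was not generated with.
    return ComputerEnvironment(value)


def parse_tolerant(value: str):
    # Returns None for unknown values so deserialization can continue.
    try:
        return ComputerEnvironment(value)
    except ValueError:
        return None


print(parse_strict("linux"))
print(parse_tolerant("ubuntu"))
```

A strict parser rejects a server response containing ubuntu, while a tolerant one degrades to an unknown marker; which behavior a given client has is what determines whether the change is breaking for it.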
EvalRunOutputItemResult:
  title: EvalRunOutputItemResult
  required:
    - name
    - score
    - passed
  type: object
  properties:
    name:
      type: string
      description: The name of the grader.
    passed:
      type: boolean
      description: Whether the grader considered the output a pass.
    sample:
      type: object
      description: Optional sample or intermediate data produced by the grader.
      nullable: true
    score:
      type: number
      description: The numeric score produced by the grader.
    type:
      type: string
      description: 'The grader type (for example, "string-check-grader").'
  description: "A single grader result for an evaluation run output item.\n"
Result schema includes “sample” which duplicates parent; shape mismatch with example.
EvalRunOutputItemResult.sample conflicts with the top-level output_item.sample shown in the example. Keep sample only at one level (prefer parent), or clearly differentiate (e.g., debug_sample).
Apply:
  EvalRunOutputItemResult:
    title: EvalRunOutputItemResult
    required:
      - name
      - score
      - passed
    type: object
    properties:
      name:
        type: string
        description: The name of the grader.
      passed:
        type: boolean
        description: Whether the grader considered the output a pass.
-     sample:
-       type: object
-       description: Optional sample or intermediate data produced by the grader.
-       nullable: true
      score:
        type: number
        description: The numeric score produced by the grader.
      type:
        type: string
        description: 'The grader type (for example, "string-check-grader").'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- EvalRunOutputItemResult:
-   title: EvalRunOutputItemResult
-   required:
-     - name
-     - score
-     - passed
-   type: object
-   properties:
-     name:
-       type: string
-       description: The name of the grader.
-     passed:
-       type: boolean
-       description: Whether the grader considered the output a pass.
-     sample:
-       type: object
-       description: Optional sample or intermediate data produced by the grader.
-       nullable: true
-     score:
-       type: number
-       description: The numeric score produced by the grader.
-     type:
-       type: string
-       description: 'The grader type (for example, "string-check-grader").'
-   description: "A single grader result for an evaluation run output item.\n"
+ EvalRunOutputItemResult:
+   title: EvalRunOutputItemResult
+   required:
+     - name
+     - score
+     - passed
+   type: object
+   properties:
+     name:
+       type: string
+       description: The name of the grader.
+     passed:
+       type: boolean
+       description: Whether the grader considered the output a pass.
+     score:
+       type: number
+       description: The numeric score produced by the grader.
+     type:
+       type: string
+       description: 'The grader type (for example, "string-check-grader").'
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around lines 14764 to 14788, the schema
defines EvalRunOutputItemResult.sample which duplicates the parent
output_item.sample and the example shows a different shape; remove the nested
sample field or rename it (e.g., debug_sample) and update its schema and example
to match the new name or eliminate it so only the parent sample remains; update
any $ref/usages and the example objects to reflect the single canonical sample
location, and run schema validation to ensure shapes match the example.
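The "run schema validation" step can be approximated with a small shape check on each result object. A minimal sketch, assuming the suggestion to drop the nested sample field is adopted; the function name and error messages are hypothetical, and the required fields are taken from the schema quoted above.

```python
# Required fields and their expected Python types, per the quoted schema.
REQUIRED = {"name": str, "score": (int, float), "passed": bool}


def validate_result(result: dict) -> list:
    """Return a list of problems found in one EvalRunOutputItemResult-like dict."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in result:
            errors.append(f"missing required field: {field}")
        elif isinstance(result[field], bool) and expected is not bool:
            # bool is a subclass of int, so reject it explicitly for non-boolean fields.
            errors.append(f"wrong type for {field}")
        elif not isinstance(result[field], expected):
            errors.append(f"wrong type for {field}")
    # Assumption: sample lives only on the parent output item.
    if "sample" in result:
        errors.append("unexpected nested field: sample")
    return errors


print(validate_result({"name": "exact-match", "score": 1.0, "passed": True}))
print(validate_result({"name": "exact-match", "passed": True, "sample": {}}))
```

If the nested field is renamed instead of removed (for example to debug_sample), the last check would need to change accordingly.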
type: 'null'
nullable: true
Status modeled as null conflicts with new enum.
If FunctionCallItemStatus (Lines 15663–15667) is authoritative, allow union with null rather than null-only.
- status:
-   type: 'null'
-   nullable: true
+ status:
+   anyOf:
+     - $ref: '#/components/schemas/FunctionCallItemStatus'
+     - { type: 'null' }
Align with your OAS version (use nullable only if staying on 3.0).
📝 Committable suggestion
- type: 'null'
- nullable: true
+ status:
+   anyOf:
+     - $ref: '#/components/schemas/FunctionCallItemStatus'
+     - { type: 'null' }
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around lines 15690-15691, the schema
currently models a null-only status which conflicts with the
FunctionCallItemStatus enum defined at lines 15663–15667; update the schema to
allow the enum values or null (i.e., a union) instead of null-only—if your
OpenAPI version is 3.1+, represent this as a oneOf/anyOf with the enum schema
and a {"type":"null"}, or if you remain on 3.0.x use the nullable: true flag on
the enum schema so the enum can also be null; ensure the final representation
matches the file's OAS version.
properties:
  max_completions_tokens:
    minimum: 1
    type: integer
    description: "The maximum number of tokens the grader model may generate in its response.\n"
    nullable: true
  reasoning_effort:
    $ref: '#/components/schemas/ReasoningEffort'
  seed:
    type: integer
    description: "A seed value to initialize the randomness, during sampling.\n"
    nullable: true
  temperature:
    type: number
    description: "A higher temperature increases randomness in the outputs.\n"
    nullable: true
  top_p:
    type: number
    description: "An alternative to temperature for nucleus sampling; 1.0 includes all tokens.\n"
    default: 1
    nullable: true
    example: 1
description: The sampling parameters for the model.
Inconsistent property name: max_completions_tokens vs max_completion_tokens.
Elsewhere you use max_completion_tokens. This pluralization drift will break clients.
  sampling_params:
    type: object
    properties:
-     max_completions_tokens:
+     max_completion_tokens:
        minimum: 1
        type: integer
        description: "The maximum number of tokens the grader model may generate in its response.\n"
        nullable: true
      reasoning_effort:
        $ref: '#/components/schemas/ReasoningEffort'
Add a deprecated alias only if you must support both for a transition.
📝 Committable suggestion
- properties:
-   max_completions_tokens:
-     minimum: 1
-     type: integer
-     description: "The maximum number of tokens the grader model may generate in its response.\n"
-     nullable: true
-   reasoning_effort:
-     $ref: '#/components/schemas/ReasoningEffort'
-   seed:
-     type: integer
-     description: "A seed value to initialize the randomness, during sampling.\n"
-     nullable: true
-   temperature:
-     type: number
-     description: "A higher temperature increases randomness in the outputs.\n"
-     nullable: true
-   top_p:
-     type: number
-     description: "An alternative to temperature for nucleus sampling; 1.0 includes all tokens.\n"
-     default: 1
-     nullable: true
-     example: 1
- description: The sampling parameters for the model.
+ sampling_params:
+   type: object
+   properties:
+     max_completion_tokens:
+       minimum: 1
+       type: integer
+       description: "The maximum number of tokens the grader model may generate in its response.\n"
+       nullable: true
+     reasoning_effort:
+       $ref: '#/components/schemas/ReasoningEffort'
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around lines 15971 to 15993, the schema
uses the inconsistent property name max_completions_tokens; change the canonical
property to max_completion_tokens (singular "completion") and update all
references/examples to that name, and if you must support the old name during a
transition add max_completions_tokens as an alias property that has the exact
same schema type/constraints but is marked deprecated (add deprecated: true and
a description pointing users to max_completion_tokens) so both parse the same
value while signaling clients to migrate.
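The transition described in this prompt could look like the following on the consuming side. This is a sketch under stated assumptions: the function name is hypothetical, and raising an error when both keys disagree is one possible policy, not something the spec prescribes.

```python
import warnings

CANONICAL = "max_completion_tokens"
DEPRECATED = "max_completions_tokens"


def normalize_sampling_params(params: dict) -> dict:
    """Return a copy of params with the deprecated alias folded into the canonical key."""
    normalized = dict(params)
    if DEPRECATED in normalized:
        legacy = normalized.pop(DEPRECATED)
        # Refuse to guess when both spellings are present with different values.
        if CANONICAL in normalized and normalized[CANONICAL] != legacy:
            raise ValueError(f"{DEPRECATED} and {CANONICAL} disagree")
        warnings.warn(
            f"{DEPRECATED} is deprecated; use {CANONICAL}",
            DeprecationWarning,
            stacklevel=2,
        )
        normalized[CANONICAL] = legacy
    return normalized
```

Calling `normalize_sampling_params({"max_completions_tokens": 32768, "seed": 42})` yields a dict keyed by `max_completion_tokens` and emits a `DeprecationWarning`, so both spellings parse to the same value while clients are nudged to migrate.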
  description: "A ScoreModelGrader object that uses a model to assign a score to the input.\n"
  x-oaiMeta:
-   example: "{\n \"type\": \"score_model\",\n \"name\": \"Example score model grader\",\n \"input\": [\n {\n \"role\": \"user\",\n \"content\": (\n \"Score how close the reference answer is to the model answer. Score 1.0 if they are the same and 0.0 if they are different.\"\n \" Return just a floating point score\\n\\n\"\n \" Reference answer: {{item.label}}\\n\\n\"\n \" Model answer: {{sample.output_text}}\"\n ),\n }\n ],\n \"model\": \"gpt-4o-2024-08-06\",\n \"sampling_params\": {\n \"temperature\": 1,\n \"top_p\": 1,\n \"seed\": 42,\n },\n}\n"
+   example: "{\n \"type\": \"score_model\",\n \"name\": \"Example score model grader\",\n \"input\": [\n {\n \"role\": \"user\",\n \"content\": (\n \"Score how close the reference answer is to the model answer. Score 1.0 if they are the same and 0.0 if they are different.\"\n \" Return just a floating point score\\n\\n\"\n \" Reference answer: {{item.label}}\\n\\n\"\n \" Model answer: {{sample.output_text}}\"\n ),\n }\n ],\n \"model\": \"o4-mini-2025-04-16\",\n \"sampling_params\": {\n \"temperature\": 1,\n \"top_p\": 1,\n \"seed\": 42,\n \"max_completions_tokens\": 32768,\n \"reasoning_effort\": \"medium\"\n },\n}\n"
🧩 Analysis chain
Fix example to match the canonical key and verify model id.
Update example key to max_completion_tokens. Also verify o4-mini-2025-04-16 is the correct, GA model identifier.
- "max_completions_tokens": 32768,
+ "max_completion_tokens": 32768,
🌐 Web query:
Is "o4-mini-2025-04-16" the correct, currently available OpenAI model id for scoring/graders?
💡 Result:
Short answer: No.
Details: o4-mini-2025-04-16 is a real OpenAI model released April 16, 2025. However Microsoft/Azure's grader (score_model) docs list supported grader model ids as gpt-4o-2024-08-06 and o3-mini-2025-01-31 — they do not list o4-mini-2025-04-16 as a supported grader. [1][2]
Sources:
- OpenAI announcement for o3 and o4-mini (release date Apr 16, 2025). [1]
- Azure OpenAI / Azure AI Foundry docs showing supported grader models (gpt-4o-2024-08-06, o3-mini-2025-01-31). [2]
Use canonical key and replace unsupported grader model
File: src/libs/tryAGI.OpenAI/openapi.yaml — update example key and replace the grader model id: Azure's grader docs list gpt-4o-2024-08-06 and o3-mini-2025-01-31 as supported; o4-mini-2025-04-16 is not listed.
- "max_completions_tokens": 32768,
+ "max_completion_tokens": 32768,
📝 Committable suggestion
- example: "{\n \"type\": \"score_model\",\n \"name\": \"Example score model grader\",\n \"input\": [\n {\n \"role\": \"user\",\n \"content\": (\n \"Score how close the reference answer is to the model answer. Score 1.0 if they are the same and 0.0 if they are different.\"\n \" Return just a floating point score\\n\\n\"\n \" Reference answer: {{item.label}}\\n\\n\"\n \" Model answer: {{sample.output_text}}\"\n ),\n }\n ],\n \"model\": \"o4-mini-2025-04-16\",\n \"sampling_params\": {\n \"temperature\": 1,\n \"top_p\": 1,\n \"seed\": 42,\n \"max_completions_tokens\": 32768,\n \"reasoning_effort\": \"medium\"\n },\n}\n"
+ "max_completion_tokens": 32768,
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around line 16002, the example uses the
non-canonical property key max_completions_tokens and a grader model id
(o4-mini-2025-04-16) that Azure's grader docs do not list; rename the key to
max_completion_tokens and swap the grader model id to one of Azure's listed
models such as gpt-4o-2024-08-06 or o3-mini-2025-01-31, ensuring the key and
model id match the schema and the documented Azure grader options.
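Drift between an example and the schema it documents can be caught with a small key check. The x-oaiMeta example is not strict JSON (it contains parentheses and trailing commas), so this sketch uses regular expressions rather than a JSON parser. It assumes the canonical key is max_completion_tokens; the function name and the `DECLARED` set are hypothetical.

```python
import re

# Property names assumed to be declared under sampling_params after the rename.
DECLARED = {"max_completion_tokens", "reasoning_effort", "seed", "temperature", "top_p"}


def undeclared_sampling_keys(example: str, declared: set) -> set:
    """Return keys used inside the example's sampling_params that the schema does not declare."""
    match = re.search(r'"sampling_params"\s*:\s*\{(.*?)\}', example, re.DOTALL)
    if match is None:
        return set()
    keys = set(re.findall(r'"(\w+)"\s*:', match.group(1)))
    return keys - declared


example = r'''
"model": "o4-mini-2025-04-16",
"sampling_params": {
    "temperature": 1,
    "top_p": 1,
    "seed": 42,
    "max_completions_tokens": 32768,
    "reasoning_effort": "medium"
},
'''

print(undeclared_sampling_keys(example, DECLARED))
```

The check flags only the misspelled key; it does not validate the model id, which still has to be confirmed against the provider's documentation.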
Summary by CodeRabbit
New Features
Refactor
Documentation